WWW Spiders: an introduction
Abstract
In recent years, the study of complex networks has received considerable attention. Real systems, including information networks and relationships between people and users, have gained importance in scientific publications despite an important drawback: the difficulty of retrieving and managing such a large quantity of information. This paper is an introduction to the construction of spiders and scrapers: specifically, how to program and safely deploy this kind of software application. The aim is to show how software can be prepared to automatically surf the net and retrieve information for the user efficiently and safely.
Similar resources
Information retrieval on Internet using meta-search engines: A review
Introduction: Though automatic information retrieval (IR) existed before the World Wide Web (WWW), the post-Internet era has made it indispensable. IR is a subfield of computer science concerned with presenting relevant information, gathered from online information sources, to users in response to search queries. Various types of IR tools have been created solely to search for information on the Internet. Apart ...
Analysing Users WWW Search Behaviour
In a recent study [1], Internet users ranked search as their most important activity, awarding it a 9.1 on a 10-point scale. The next most important activity ranked only 6.3. Internet search engines are continually updating their indexes, and scaling up their parallel processors to keep up with the growth of the WWW. It is estimated that there are 800 million indexable pages in the WWW [2], and...
Lost in Hyperspace? Free Text Searches in the Web
The World Wide Web (WWW) [LCG92] is a distributed hypermedia system for information discovery, retrieval, and collaboration. The hypertext paradigm has proven its usefulness for browsing large, distributed document structures. The ease of use provided by this paradigm is one of the reasons for the great popularity which the World Wide Web has gained through the last months. However, as the amou...
Comparison of Three Vertical Search Spiders
The Web has plenty of useful resources, but its dynamic, unstructured nature makes them difficult to locate. Search engines help, but the number of Web pages now exceeds two billion, making it difficult for general-purpose engines to maintain comprehensive, up-to-date search indexes. Moreover, as the Web grows ever larger, so does information overload in query results. A general-purpose search e...
Journal: CoRR
Volume: abs/0710.5054
Publication date: 2007